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Abstract 

A parallel implementation of the Balancing Domain Decomposition by Con- 
straints (BDDC) method is described. It is based on formulation of BDDC 
with global matrices without explicit coarse problem. The implementation is 
based on the MUMPS parallel solver for computing the approximate inverse 
used for preconditioning. It is successfully applied to several problems of 
Stokes flow discretized by Taylor-Hood finite elements and BDDC is shown 
to be a promising method also for this class of problems. 
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1. Introduction 

In many areas of engineering, numerical solution of problems by the finite 
element method (FEM) leads to solution of systems of linear algebraic equa- 
tions with sparse and often ill-conditioned matrices. For very large problems, 
the usual method of choice for their solution is one of the iterative methods 
based on Krylov subspaces. However, without preconditioning, the conver- 
gence rate deteriorates with growing condition number of the problem. The 
need of first-rate preconditioners tailored to the solved problem, which can 
be implemented in parallel, gave rise to the field of domain decomposition 
methods (e.g. pU]). 

The Balancing Domain Decomposition based on Constraints (BDDC) is 
one of the most advanced preconditioners of this class. It was introduced 
by Dohrmann [3] in 2003 and the theory was developed by Mandel and 
Dohrmann in [T2]. In an important contribution to the theory of the precon- 
ditioner [13], Mandel, Dohrmann, and Tezaur proved close connections with 
the earlier FETI-DP method by Farhat et al. [5], another popular domain 
decomposition technique. The preconditioner was reformulated without ex- 
plicit coarse problem as is used in this paper by Li and Widlund in [TT] . The 
underlying theory of the BDDC method covers problems with symmetric 
positive definite matrix. An important application that leads to such kind 
of systems is structural analysis by linear elasticity theory. 

The solution of the incompressible Stokes problem by a mixed finite ele- 
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ment method leads to a saddle point system with symmetric indefinite ma- 
trix. Thus, the standard theory of BDDC does not cover this important 
class of problems. In the first attempt to apply BDDC to the incompressible 
Stokes problem proposed by Li and Widlund [TU], the optimal precondi- 
tioning properties of BDDC were recovered. The approach is based on the 
notion of benign subspaces, which is restricted to using discontinuous pressure 
approximation, and the authors present results for piecewise constant func- 
tions. Moreover, the approach in [TU] requires quite nonstandard constraints 
between subdomains, thus making the implementation more problem specific 
and difficult. 

In this paper, we follow a different approach. We have implemented a par- 
allel version of the BDDC method and verified its performance on a number 
of problems arising from linear elasticity (e.g. [IS]). Here, we investigate the 
applicability of the method and its implementation to the Stokes fiow with 
only minor changes to the source code of the implementation for elasticity 
problems. Although such application is beyond the standard theory of the 
BDDC method, contributive results are obtained. 

It has been known for a long time, that the conjugate gradient method is 
able to reach solution also for many indefinite cases (e.g. [E]), although it 
may fail in general. Our effort is also supported by recent trends of numerical 
linear algebra to investigate and often prefer the use of preconditioned CG 
method (PCG) with a suitable indefinite preconditioner over more robust 
but also more expensive iterative methods for solving indefinite systems such 
as MINRES, BiCG or GMRES [H]. Another reason for which we do not 
switch to the MINRES method [IS], which is suitable for indefinite problems 



3 



(e.g. [5), is the fact that it requires a positive definite preconditioner, while 
BDDC at the presented setting provides an indefinite preconditioner for the 
saddle-point problem. 

Several results for the Stokes fiow in three dimensions are presented. All 
these problems are obtained using mixed discretization by Taylor-Hood finite 
elements or their serendipity version. These elements use piecewise (tri) linear 
pressure approximation, which does not allow the approach via benign spaces 
of [TDj, but are very popular in the computational fiuid dynamics community. 



2. Stokes problem and approximation by mixed FEM 

Let Q be an open bounded domain in filled with an incompressible 

viscous fiuid, and let dQ be its boundary. Isothermal low speed fiow of 
such fiuid is modelled by the following Stokes system of partial differential 
equations 

- uAu + Vp = f in fi, (1) 

-V ■ u = in fi, (2) 

u = g on dflg, (3) 

— z/(Vu)n + pn = on dflh, (4) 

where u denotes the vector of fiow velocity, p denotes the pressure divided by 
the (constant) density, u denotes the kinematic viscosity of the fiuid supposed 
to be constant, f denotes the density of volume forces per mass unit, dQg and 



dilh are two subsets of dil satisfying 9f2 = d^lg U dilh, fJ-R'^idilg fl dilh) = 0, 
n denotes an outer normal vector to the boundary dQ with unit length, and 
g is a given function satisfying J^^ g • n ds = in the case of dQ = dVtg . 



We derive the weak formulation of the Stokes equations ([T])-(|4]) in the 
manner of mixed methods (cf. [7]). Let us consider the vector function space 
V = [^r e [H\Q)f■,^r\9^^ = o} and the set Vg = {^v e [H\Q)f■,^r\en, = 
g|, where H^{Q) is the usual Sobolev space, and the restriction ^r\QQ^ is 
understood in the sense of traces. 

We now introduce a triangulation of the domain Q into Taylor-Hood 
finite elements P2-P1 and/or Q2Q1 (or their serendipity version Q2SQ1), which 
satisfy the Babuska-Brezzi stability condition (cf. [2]). Their application 
leads to the finite dimensional subsets Vgh G Vg, Vh C V, which contain 
continuous piecewise quadratic functions, and Qh C ^2(^2) with continuous 
piecewise linear functions. 

We can now introduce the discrete Stokes problem: 
Find Uh G Vgh and ph G Qh satisfying 



u / Vuh : Vvftdil - / phV ■ Vftdl] 
Iq Jn 



n 



- / iph"^ ■ UhdQ ■■ 
In 

u/i - Ugh E Vh. 



f ■ v/,dfi, Vvft e Vh, 
0, ytph e Qh, 



(5) 
(6) 



Here Ugh E Vgh represents the Dirichlet boundary condition g in (|3]). 

Expressing the finite element functions as linear combinations of basis 
functions (see e.g. [U Section 5.3] for more details), the problem finally leads 
to the saddle point system of algebraic equations 



(7) 



A 






u 




f 


B 







P 








where u denotes velocity unknowns, p denotes pressure unknowns, A and B 
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are called vector-Laplacian matrix and the divergence matrix, respectively, 
and f is the discrete vector of intensity of volume forces per mass unit. 

3. Iterative substructuring 

In this section, we recall ideas of iterative substructuring used in our 
implementation. Details can be found in [20]. 

Let f2 be a bounded domain in or M^, let U he a finite element space 
of piecewise polynomial functions v continuous on Q and U' its dual space. 
Let a(-, ■) be a bilinear form on U x U and f & U' , and let (-, ■) denote the 
duality pairing of U' and U. Consider now an abstract variational problem: 
Find u E U such that 

a{u,v) = {f,v} \/veU. (8) 
Write the matrix problem corresponding to ([s]) as 

Au = f. (9) 

The domain Q is decomposed into nonoverlapping subdomains fli, i = 
1,...,N, with characteristic size H, which form a conforming triangulation 
of the domain Q. Each subdomain is a union of several finite elements of 
the underlying mesh with characteristic mesh size h, i.e. nodes of the finite 
elements between subdomains coincide. Unknowns common to at least two 
subdomains are called boundary unknowns and the union of all boundary 
unknowns is called the interface F. 

The problem is first reduced to the interface F. For this purpose, the 
solution u is split into the interior solution Uo, with zero values at F, and Mr, 
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where values in subdomain interiors are determined by values at F (see ( 12 ) 
below). Then problem ^ may be rewritten as 



A{ur + Uo) = f. 



(10) 



Let us formally reorder unknowns of problem (10) into two blocks, with the 
first block (subscript i) corresponding to unknowns in subdomain interiors, 
and the second block (subscript 2) corresponding to unknowns at the inter- 



face. This results in the block form of the system (10) given as 



An 


A12 




un + Uoi 




fl 


A21 


A22 




UV2 + Uo2 







(11) 



with Uo2 = by definition. Function u-p is called discrete harmonic, by which 
we mean that it is fully determined by values at interface F and by the 
algebraic condition 



AiiUn + A12UT2 = 0, i.e. AuUn = -Ai2Uy2- 



(12) 



By this splitting, we derive that (11) is equivalent to 



AiiUoi = f 



1) 



An 


A12 











A21 


A22 




Ut2 




/2 - A21U0I 



(13) 
(14) 



and the solution is obtained as m = Mr + Uo- Problem (14) can be further 
split into two problems 

AiiUri = -Ai2Ur2, (15) 
Sur2 = 92, (16) 
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where S is the Schur complement with respect to unknowns at interface F 
defined as 5* = A22 — A2iA^i A12, and g2 is the condensed right hand side 
92 = f2 — A21U01 = f2 — ^2i^r//i- Since An has a block diagonal structure, 



the solution to (13) may be found in parallel and similarly the solution to 



(15). We are ready to recall the algorithm of substructuring. 



Algorithm 1 (Iterative substructuring). Problem is solved in the 
following steps: 



1. factorize block diagonal matrix An in (13) and store factors, 



2. solve (13) by back- substitution to find Uoi, 



3. construct g2 as g2 = f2 — ^21^01? 



4. solve problem (16) by a Krylov subspace method. In each iteration, 
multiplication of a given vector p2 by S is realized as 

• find pi by solution of Anpi = —A12P2, 

• get Sp2 as Sp2 = A21P1 + ^22^2- 



5. Find un by (15), 







+ 


Uol 


u as u = 








UV2 








Note, that the Schur complement 5* is never formed explicitly and its action 
is realized by three sparse matrix multiplications and one back-substitution. 
The main reason for using Algorithm [l] is usually much faster convergence 
of the iterative method for problem (16) compared to problem (|9]) (see e.g. 
EQI). 
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4. BDDC preconditioner 



The BDDC method provides a preconditioner for problem (16). Let W, 



be the space of finite element functions on subdomain Qi, coefficients of 



which satisfy the algebraic discrete harmonic condition (12) locally on the 
subdomain, and put W = Wi x ■ ■ ■ x Wn- It is the space where subdomains 
are completely disconnected at the interface F, and functions on them are 
independent of each other. Let us further define f/r C f/ as the subset of 
finite element functions on Q with coefficients satisfying the discrete harmonic 



condition (12). Clearly, Ur C W, and the solution to (14) -ur G Ur- 

The main idea of the BDDC preconditioner in the abstract form [H] is to 
construct an auxiliary finite dimensional space W such that Ur C W G W, 
and extend the bilinear form a{-,-) to a form a(-,-) defined on W x W, 
such that solving the variational problem ([8| with a (■, ■) in place of a (-, ■) is 



cheaper and can be split into independent computations performed in paral- 



lel. Then the solution restricted to Ur is used for the preconditioning of (16). 
Space W contains functions generally discontinuous at interface F except a 
small set of coarse degrees of freedom at which continuity is preserved. Coarse 
degrees of freedom are typically values at selected nodes called corners. In 
addition, continuity of generalized degrees of freedom, such as averages over 
subdomain edges and/or faces, might be enforced. 

In computation, the corresponding matrix denoted A is used. It is larger 
than the original matrix of the problem A, but it possesses a simpler structure 
suitable for direct solution methods. This is the reason why it can be used 
as a preconditioner. In the presented algorithm, matrix A is constructed 
using the standard FEM assembly procedure on a virtual mesh which is 
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disconnected at interface outside corners (Figure [T]). 



4» <r= 



Figure 1: Example of an actual computational mesh with six subdomains (left) and 
corresponding virtual mesh used for assembly of matrix A (right). 



The projection E : W ^ Ur is realized as a weighted average of values 
from different subdomains at unknowns on the interface F, thus resulting in 
functions continuous across the interface. The weights at a degree of freedom 
are chosen as the inverse to the number of subdomains, in which the degree 
of freedom is contained, as is done e.g. in ^0\. This approach is used for 
both velocity and pressure unknowns. 

Let r G f/f be the residual in an iteration of an iterative method. The 
BDDC preconditioner Mbddc '■ Uy in the abstract form (see |14j ) 

produces the preconditioned residual f G t/r as 

Mbddc : r ^ v = Ew, 

where w E W is obtained as the solution to problem 

w eW ■.a{w,z) = {r,Ez) eW, (17) 

or in terms of matrices as 

V = EA'^E^r. (18) 
10 



Here, the action of is performed as a back-substitution by a direct solver. 
After V is found, we are typically interested only in its values at interface 
nodes, since multiplication of this vector by S follows and interior values are 
resolved from discrete harmonic constraint (12) as described in Algorithm [l] 
For the Stokes problem, we adopt the following slightly unusual notation 
in (§ 

a{u,v) = u Vuh■.V^rhdn- / phV-Vhdn- / ■ UhdQ, (19) 
Jq Jn Jn 

{f,v) = [ f -v.dfi. (20) 
Jn 

The bilinear form a{u,v) is symmetric but indefinite [H H]. In ([o]), this 
corresponds to putting 



A 



Should we distinguish between blocks corresponding to nodes in interiors 
of subdomains and at the interface T as in (11), saddle point problem ([T]) 
would look as 



A 


B^ ' 




u 


,/ = 


f 






,u = 






B 







p 








An 


A12 


BI, 


Bli 




Ul 




fl 


A21 


A22 


B12 


B22 




U2 




f2 


Bu 


B12 










Pi 







B21 


B22 










P2 








(21) 



and the Schur complement matrix and the condensed right hand side in ( 16 ) 
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as 
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B22 







B21 







Bu 







B12 
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fl 
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B21 
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5. Implementation and numerical results 

Our parallel implementation of the BDDC preconditioner has been exten- 
sively tested on problems with symmetric positive definite matrices arising 
from linear elasticity (e.g. [12]). The current version is based on the multi- 
frontal massively parallel sparse direct solver MUMPS [1] version 4.9.2, which 



is used for the factorization of the matrices A in (18) and An in (15). These 
matrices are put into MUMPS in the distributed format with one subdo- 
main corresponding to one processor. While Cholesky factorization is used 
for problems with symmetric positive definite system matrices, for the Stokes 
problem, MUMPS is simply switched to the LDL^ factorization of general 
symmetric matrices. If additional constraints on averages over edges or faces 
are prescribed, the generalized change of variables is used in combination 
with nullspace projection. This approach, which provides a generalization 
of the change of basis from fH], is described in detail in [T^. Iterations are 
performed by a parallel PCG solver. 

The applicability of the preconditioner to the steady problem of Stokes 
fiow has been tested, and results are presented in this section. The system 
matrix of the Stokes problem is symmetric, but indefinite. For this reason, the 
standard theory of BDDC does not cover this case. A way to assure positive 
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definiteness of the preconditioned operator based on BDDC was presented 
by Li and Widlund jlO]. However, that approach is hmited to discontinuous 
pressure approximation, and thus it can be used for neither Q2Q1 Taylor- 
Hood finite elements (e.g. P]) used in our Matlab computations nor their 
serendipity version Q2SQ1 used in our parallel computations. 

5.1. Problem (1) 

The method is first tested on the problem of the lid driven cavity. This 
popular 2D benchmark problem is used in the 3D setting as a section of 
an infinite cavity (Figure [2] left) used e.g. in [6J: The domain is a unit cube 
with unit velocity in the direction of the x-axis on the upper face (called 
lid), zero normal component of velocity {uz = 0) prescribed on faces parallel 
to xy-plane, and homogeneous Dirichlet boundary conditions for velocity on 
remaining faces. In numerical solution, all nodes with y = I are included 
into the lid, which means that the setting corresponds to the so called leaky 
cavity We fix pressure at the node in the centre of the domain to make 
its solution unique. The entire motion inside cavity is driven by viscosity of 
the fluid which is chosen as 0.01. 

The problem is uniformly discretized using Q2SQ1 serendipity finite ele- 
ments (velocity unknowns at vertices and edge-centres, pressure unknowns 
at vertices). 

Tables [T]-[3] contain results for variable number of subdomains (columns) 
and variable H/h ratio, where H stands for the characteristic size of a sub- 
domain and h denotes the characteristic size of an element. Each column 
contains results for constraints at corners only ('c') and with additional con- 
straints on averages over edges ('c-|-e'). Results are summarised with respect 
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Figure 2: 3D lid driven cavity problems: section of infinite cavity (Problem f) (left) - 
unit velocity aligned with x-axis prescribed on the upper (dark grey) face, zero compo- 
nent of velocity in z direction on two (light grey) faces, homogeneous Dirichlet boundary 
conditions for velocity on remaining (white) faces; cubic cavity (Problem 2) (right) - unit 
velocity parallel to the xz-plane rotated along y-axis prescribed on the upper (dark grey) 
face, homogeneous Dirichlet boundary condition for velocity on remaining (white) faces. 

to the number of PCG iterations and computational times of individual dom- 
inant operations in the preconditioner (the total time includes also time for 
parallel factorization of the block of interior unknowns). Computations were 
performed on SGI Altix 4700 computer in CTU Supercomputing Centre in 
Prague with 72 1.5 GHz Intel Itanium 2 processors. The stopping criterion 
for PCG was defined by relative residual as 11^112/11(7112 < 10^^. 

Resulting times are for selected settings compared to those by direct ap- 
plication of the MUMPS solver [Ij and to results by our in-house serial 
direct solver. The latter is based on the unsymmetric frontal method by 
Hood [S] , which is a generalization of the classical frontal method developed 
for symmetric positive definite problems by Irons [9J. The basic idea of the 
frontal approach lies in simultaneous assembling and eliminating of rows of 
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the system matrix finding the factors 'out-of-core'. For suitably numbered 
elements, this approach usually results in huge reduction of memory require- 
ments, while it might have a negative impact on the speed due to the I/O 
operations inside factorization. This solver has been successfully used by 
our group to solve a number of benchmark as well as real-life Stokes and 
Navier-Stokes problems. 

Solution of this cavity problem using 16x16x16 elements (corresponding 
to the last column of Table [l] or the first column of Table |2]) takes 30 minutes 
by our frontal solver on a single processor. 

Table|3]also summarises solution of the problem using 32 x 32 x 32 elements 
divided into 8 subdomains by the MUMPS solver. We have not been able to 
fit this problem into memory using the serial frontal solver. 

An example of the problem for 32x32x32 = 32,768 elements and H/h = 8 
(64 subdomains) is presented in Figure [3] left. In Figure |3] right, several 
streamlines at the z = 0.5 plane are presented. These are coloured by the 
velocity magnitude. 

Tables [T]-[3] reveal an unfortunate property of the presented approach, 
that scalability is not achieved with respect to number of resulting iterations. 
Computational times are growing with growing number of subdomains not 
only due to increasing number of iterations, but also due to the dependence 
on the MUMPS solver, which turns out not to scale well for this problem. 

Nevertheless, it is still interesting to compare the results to the frontal 
solver used to address the Stokes and Navier-Stokes problems by our group 
before, and even to compare the computational time of solution by PCG 
with BDDC preconditioner and by MUMPS (Table [3]) on eight processors. 
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Figure 3: 3D lid driven cavity problem with 32x32x32 elements, H/h = 8: division into 
64 regular subdomains (left); several streamlines at the z — 0.5 plane for Problem (1) 
(right), colours by velocity magnitude. 



subdomains (processors) 
unknowns 


8 

8,748 


27 
27,040 


64 
61,268 


constraints 


c / c+e 


c / c+e 


c / c+e 


number of PCG iterations 


18 / 15 


29 / 17 


37 / 18 


analysis by MUMPS (sec) 
factorization by MUMPS (sec) 
PCG iterations (sec) 


0.9 / 0.5 
0.2 / 0.2 
2.1 / 1.8 


1.9 / 2.1 
0.3 / 0.3 
12 / 7.1 


5.7 / 6.6 
0.5 / 0.6 

46 / 23 


one PCG iteration (sec) 


0.12 / 0.12 


0.42 / 0.42 


1.24 / 1.26 


total wall time (sec) 


5.6 / 4.9 


24 / 22 


106 / 87 



Table 1: Scaling of cavity problem (1) for variable number of subdomains and types of 
constraints, H/h = 4. 
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subdomains (processors) 


8 


27 


64 


iiiiknowijs 


61.268 


197.500 


157.380 


constraints 


c / c+e 


c / c+e 


c / c+e 


number of PCG iterations 


19 / 16 


35 / 18 


122 / 44 


analysis by MUMPS (sec) 


11 / 11 


26 / 27 


84 / 89 


factorization by MUMPS (sec) 


1.3 / 1.4 


1.9 / 2.1 


3.0 / 3.0 


PCG iterations (sec) 


9.7 / 8.4 


70 / 37 


724 / 281 


one PCG iteration (sec) 


0.51 / 0.52 


2.01 / 2.08 


5.93 / 6.39 


total wall time (sec) 


42 / 41 


195 / 182 


1,352 / 966 



Table 2: Scaling of cavity problem (1) for variable number of subdomains and types of 
constraints, H/h = 8. 



subdomains (processors) 


8 




unknowns 


457,380 


solver 


BDDC + PCG 


MUMPS 


constraints 


c / c+e 


n/a 


number of PCG iterations 


45 / 36 


n/a 


analysis by MUMPS (sec) 


168 / 166 


185 


factorization by MUMPS (sec) 


54 / 55 


1,398 


PCG iterations (sec) 


125 / 110 


n/a 


one PCG iteration (sec) 


2.78 / 3.05 


n/a 


total wall time (sec) 


634 / 651 


1,601 



Table 3: Cavity problem (1): BDDC for variable types of constraints, and direct solution 
by MUMPS solver, H/h = 16. 
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for which the former is two times faster. This result supports the initial 
idea of the implementation - that using the parallel direct solver for the 
disconnected problem as a preconditioner for an iterative method can be 
much faster than using the same solver for the original problem directly. 

5.2. Problem (2) 

The next problem is formed by another generalization of the 2D cavity 
problem into 3D inspired by It has the following setting (Figure |2]right): 
Homogeneous Dirichlet boundary conditions for velocity are considered on 
all faces except the lid, where unit tangential velocity is prescribed. However, 
to make also the nature of the flow three-dimensional, the velocity is now not 
aligned with the x-axis but rotated by angle vr/S. Again, pressure is fixed at 
the node in the centre of the domain and the viscosity of the fluid is chosen 
as 0.01. 

This problem is uniformly discretized using (full) Q2Q1 Taylor-Hood fi- 
nite elements. Serendipity Q2SQ1 elements cannot be used for this problem 
since they fail to determine pressure at the eight corners of the domain if 
Dirichlet boundary conditions on velocity are prescribed on all three faces at 
a corner. 

Tables |4]-[5] summarise performance of the BDDC preconditioner in con- 
nection with GMRES and BiCGStab methods, respectively. Since these ex- 
periments were run in Matlab by a serial implementation of the method, the 
comparison is done only for number of iterations and times are not presented. 
The stopping criterion was defined by relative residual as ||r||2/||5'||2 < lO"*^. 

Table |4] presents results for variable size of the problem and variable con- 
straints with fixed ratio H/h = 4. Columns contain results for corners only 
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('c'), with additional constraints on averages over edges ('c+e'), with addi- 
tional constraints on averages over faces ('c+f'), and with both ('c+e+f'). 
This experiment shows, that using averages on edges and faces, number of 
iterations for larger problem does not grow with problem size. This is not 
achieved with corner constraints only, and even with averages on edges or 
faces, number of iterations slightly grows. For reference, number of iterations 
without preconditioning ('no prec.') is also reported. 

Table [s] presents results for fixed number of subdomains (eight) with 
variable H/h ratio. This experiment shows, that number of iterations mildly 
grows for all types of constraints with growing H/h, in agreement with avail- 
able theory for SPD problems. 

Numbers of iterations obtained for the same divisions, but using Matlab 
implementation of ILU preconditioner with variable threshold for dropping 
entries (ILUT [18j) are presented in Tables [6] and [7j These tables present 
results with respect to variable threshold for dropping entries in factors rang- 
ing from 10^'^ to 10~^. Where ('-') occurs, PCG method fails to converge. 
We can conclude, that the ILUT preconditioner with threshold 10~^ is very 
efficient and seems to reach independence of size of this problem. These 
tables also suggest, that with an improving preconditioner, PCG method 
tends to converge also for these indefinite problems. Numbers of iterations 
for threshold 10~^ are comparable with BDDC with sufficient constraints. 

5.3. Problem (3) 

The last problem is inspired by flow in artiflcial arteries. The geometry is 
simplifled to a tube with a sudden reduction of diameter. Due to the symme- 
try of the tube, only one quarter is considered in the computation. Constant 
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subdomains (unknowns) 


BDDC 

c c+e c+f c+e+f 


no prec. 


8 (15,468) 
27 (49,072) 

64 (112,724) 
125 (216,024) 


31/24.5 28/24.5 26/22.5 23/19.5 
50/74.5 35/31.5 31/33.5 26/23 
75/126.5 42/46.5 34/27.5 27/44.5 
115/182.5 47/41.5 35/28.5 27/42.5 


223/608.5 
436/731.5 

627/1,074 
782/1,168.5 



Table 4: Number of iterations by GMRES/BiCGStab with BDDC preconditioner for 
variable type of constraints, and without preconditioning, variable number of subdomains, 
H/h = 4. 



H/h (unknowns) 


BDDC 

c c+e c+f c+e+f 


no prec. 


2 (2,312) 
4 (15,468) 
8 (112,724) 
12 (368,572) 


26/20 22/19.5 22/17.5 19/15.5 
31/24.5 28/24.5 26/22.5 23/19.5 

38/44 33/26.5 29/63.5 26/22.5 
42/35.5 37/31.5 32/173.5 29/98.5 


92/372.5 
223/608.5 
418/690.5 
530/725.5 



Tabic 5: Number of iterations by GMRES/BiCGStab with BDDC preconditioner for 
variable type of constraints, and without preconditioning, variable H/h, 2x2x2 = 8 sub- 
domains. 
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subdomains (unknowns) 


ILUT threshold 
10-3 10-^ 10-^ 


8 (15,468) 
27 (49,072) 
64 (112,724) 

125 (216.021) 


15/11/- 5/3.5/- 3/2/3 
32/40.5/- 9/5.5/- 5/2.5/5 
49/97.5/- 15/11.5/- 6/3.5/20 

57/138/ 23/15.5/ 6/1/ 



Tabic 6: Number of iterations by GMRES/BiCGStab/PCG with ILUT preconditioner for 
variable threshold, variable number of subdomains, H/h — 4. 



H/h (unknowns) 


ILUT threshold 
10-3 10-^ 10-^ 


2 (2,312) 
4 (15,468) 
8 (112,724) 
12 (368,572) 


6/4/11 3/1.5/4 3/1.5/3 
15/11/- 5/3.5/- 3/2/3 

43/66.5/- 15/10.5/- 5/3.5/10 
44/79.5/- 29/26/- out of memory 



Table 7: Number of iterations by GMRES/BiCGStab/PCG with ILUT preconditioner for 
variable threshold, variable H/h, 2x2x2 = 8 subdomains. 
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kinematic viscosity v = 0.01 is considered. Parabolic velocity profile with 
unit mean value is prescribed at the inlet, and 'do-nothing' boundary condi- 
tion ^ at the outlet. The diameter of the tube at the inlet is 0.025 and at 
the narrow part 0.019. The mesh consists of 3,393 Q2sQi finite elements with 
54,248 unknowns. It was divided into 4 subdomains by METIS (Figure |4]). 
Solution of this rather small problem with only corner constraints requires 
33 PCG iterations and takes 30 seconds, which is comparable to 133 seconds 
for solution by serial frontal solver, but now obtained in parallel. Applica- 
tion of averages on faces does not reduce the number of iterations while the 
solution takes 40 seconds due to the overhead of transforming the matrix. 
Streamlines and pressure contours are plotted in Figure |5| 




Figure 4: Mesh of the tube of Problem (3) divided into 4 subdomains. 




Figure 5: Solution of Problem (3): streamlines coloured by velocity magnitude (left) and 
pressure contours (right). 
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6. Conclusion 

In our contribution, we present a straightforward parallel implementation 
of the BDDC preconditioner based on its global formulation and built on top 
of the parallel direct solver MUMPS. After a verification of the solver on a 
number of problems from linear elasticity analysis, we explore the application 
of BDDC to problems with indefinite matrices, namely the Stokes problem. 
Although the available theory either does not cover this case, or treats it 
differently [TOl |2T], the presented experiments suggest promising ways for 
this effort. Results for two versions of the benchmark problem of the lid 
driven cavity and for a real-life problem are presented. These results show 
that the BDDC preconditioner is applicable to the Stokes flow and may speed 
up the solution considerably. 

Without claiming that this is the general case, we have performed sev- 
eral experiments, for which the current parallel implementation based on the 
PCG method is successfully used even though the system matrix is indefinite. 
The reason why a breakdown was not observed lies probably in the indefinite- 
ness of the BDDC preconditioner. Although solution times present a large 
advancement compared to the method previously used for these problems 
by our group, the experiments reveal that optimal scalability is not achieved 
for the PCG method neither with respect to number of iterations, nor with 
respect to computational times. 

On the other hand, our Matlab experiments combining the BDDC pre- 
conditioner with GMRES and BiCGStab methods suggest, that for suitably 
chosen constraints, optimal behaviour can be achieved with respect to grow- 
ing number of subdomains. 
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